The first session will focus on reading in the data after
steinbock segmentation (see Session 2).
We use the imcRtools
package to read in single-cell data extracted using the
steinbock framework. During image processing you will also
obtain multi-channel images and segmentation masks that can be read into
R using the cytomapper
package.
library(imcRtools)
library(cytomapper)
For single-cell data analysis in R the framework. It
allows standardized access to (i) expression data, (ii) cellular
metadata (e.g. cell type), (iii) feature metadata (e.g. marker name) and
(iv) experiment-wide metadata. For an in-depth introduction to the
SingleCellExperiment container, please refer to the SingleCellExperiment
class.
The SpatialExperiment
class is an extension of the SingleCellExperiment class. It
was developed to store spatial data in addition to single-cell data and
an extended introduction is accessible here.
To read in single-cell data generated by the steinbock
framework the imcRtools package provides the
read_steinbock function. By default, the data is read into
a SpatialExperiment object; however, data can be read in as
a SingleCellExperiment object by setting
return_as = "sce". All functions presented in this tutorial
are applicable to both data containers.
The downloaded example data (see data)
processed with the steinbock
framework can be read in with the read_steinbock function
provided by imcRtools. For more information, please refer
to ?read_steinbock.
spe <- read_steinbock("data/steinbock/")
spe
## class: SpatialExperiment
## dim: 40 47859
## metadata(0):
## assays(1): counts
## rownames(40): MPO HistoneH3 ... DNA1 DNA2
## rowData names(11): channel name ... Final.Concentration...Dilution
## uL.to.add
## colnames: NULL
## colData names(8): sample_id ObjectNumber ... width_px height_px
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : Pos_X Pos_Y
## imgData names(1): sample_id
By default, single-cell data is read in as
SpatialExperiment object. The summarized pixel intensities
per channel and cell (here mean intensity) are stored in the
counts slot. Columns represent cells and rows represent
channels.
counts(spe)[1:5,1:5]
## [,1] [,2] [,3] [,4] [,5]
## MPO 0.5751064 0.4166667 0.4975494 0.890154 0.1818182
## HistoneH3 3.1273082 11.3597883 2.3841440 7.712961 1.4512715
## SMA 0.2600939 1.6720383 0.1535190 1.193948 0.2986703
## CD16 2.0347747 2.5880536 2.2943074 15.629083 0.6084220
## CD38 0.2530137 0.6826669 1.1902979 2.126060 0.2917793
Metadata associated to individual cells are stored in the
colData slot. After initial image processing, these
metadata include the numeric identifier (ObjectNumber), the
area, and morphological features of each cell. In addition,
sample_id stores the image name from which each cell was
extracted and the width and height of the corresponding images are
stored.
head(colData(spe))
## DataFrame with 6 rows and 8 columns
## sample_id ObjectNumber area axis_major_length axis_minor_length
## <character> <numeric> <numeric> <numeric> <numeric>
## 1 Patient1_001 1 12 7.40623 1.89529
## 2 Patient1_001 2 24 16.48004 1.96284
## 3 Patient1_001 3 17 9.85085 1.98582
## 4 Patient1_001 4 24 8.08290 3.91578
## 5 Patient1_001 5 22 8.79367 3.11653
## 6 Patient1_001 6 25 9.17436 3.46929
## eccentricity width_px height_px
## <numeric> <numeric> <numeric>
## 1 0.966702 600 600
## 2 0.992882 600 600
## 3 0.979470 600 600
## 4 0.874818 600 600
## 5 0.935091 600 600
## 6 0.925744 600 600
The main difference between the SpatialExperiment and
the SingleCellExperiment data container in the current
setting is the way spatial locations of all cells are stored. For the
SingleCellExperiment container, the locations are stored in
the colData slot while the SpatialExperiment
container stores them in the spatialCoords slot:
head(spatialCoords(spe))
## Pos_X Pos_Y
## 1 468.5833 0.4166667
## 2 515.8333 0.4166667
## 3 587.2353 0.4705882
## 4 192.2500 1.2500000
## 5 231.7727 0.9090909
## 6 270.1600 1.0400000
The spatial cell graphs generated by steinbock
are read into a colPair slot of the
SpatialExperiment (or SingleCellExperiment)
object. Cell-cell interactions (cells in close spatial proximity) are
represented as “edge list” (stored as SelfHits object).
Here, the left side represents the column indices of the “from” cells
and the right side represents the column indices of the “to” cells. We
will later see how to visualize the spatial cell graphs.
colPair(spe, "neighborhood")
## SelfHits object with 257116 hits and 0 metadata columns:
## from to
## <integer> <integer>
## [1] 1 27
## [2] 1 55
## [3] 2 10
## [4] 2 44
## [5] 2 81
## ... ... ...
## [257112] 47858 47836
## [257113] 47859 47792
## [257114] 47859 47819
## [257115] 47859 47828
## [257116] 47859 47854
## -------
## nnode: 47859
Finally, metadata regarding the channels are stored in the
rowData slot. This information is extracted from the
panel.csv file. Channels are ordered by isotope mass and
therefore match the channel order of the multi-channel images.
head(rowData(spe))
## DataFrame with 6 rows and 11 columns
## channel name keep ilastik deepcell Tube.Number
## <character> <character> <numeric> <numeric> <numeric> <numeric>
## MPO Y89 MPO 1 NA NA 2101
## HistoneH3 In113 HistoneH3 1 1 1 2113
## SMA In115 SMA 1 NA NA 1914
## CD16 Pr141 CD16 1 NA NA 2079
## CD38 Nd142 CD38 1 NA NA 2095
## HLADR Nd143 HLADR 1 NA NA 2087
## Target Antibody.Clone Stock.Concentration
## <character> <character> <numeric>
## MPO Myeloperoxidase MPO Polyclonal MPO 500
## HistoneH3 Histone H3 D1H2 500
## SMA SMA 1A4 500
## CD16 CD16 EPR16784 500
## CD38 CD38 EPR4106 500
## HLADR HLA-DR TAL 1B5 500
## Final.Concentration...Dilution uL.to.add
## <character> <character>
## MPO 4 ug/mL 0.8
## HistoneH3 1 ug/mL 0.2
## SMA 0.25 ug/mL 0.05
## CD16 5 ug/mL 1
## CD38 2.5 ug/mL 0.5
## HLADR 1 ug/mL 0.2
When not using steinbock, the single-cell information
has to be read in from custom files. We now demonstrate how to generate
a SpatialExperiment object from single-cell data contained
in individual files. As an example, we use files generated by
steinbock.
First we will read in the single-cell features stored in a CSV file:
library(readr)
cur_intensities <- read_csv("data/steinbock/intensities/Patient1_001.csv")
cur_regionprobs <- read_csv("data/steinbock/regionprops/Patient1_001.csv")
dim(cur_intensities)
## [1] 3567 41
dim(cur_regionprobs)
## [1] 3567 7
colnames(cur_intensities)
## [1] "Object" "MPO" "HistoneH3"
## [4] "SMA" "CD16" "CD38"
## [7] "HLADR" "CD27" "CD15"
## [10] "CD45RA" "CD163" "B2M"
## [13] "CD20" "CD68" "Ido1"
## [16] "CD3" "LAG3 / LAG33" "CD11c"
## [19] "PD1" "PDGFRb" "CD7"
## [22] "GrzB" "PDL1" "TCF7"
## [25] "CD45RO" "FOXP3" "ICOS"
## [28] "CD8a" "CarbonicAnhydrase" "CD33"
## [31] "Ki67" "VISTA" "CD40"
## [34] "CD4" "CD14" "Ecad"
## [37] "CD303" "CD206" "cleavedPARP"
## [40] "DNA1" "DNA2"
colnames(cur_regionprobs)
## [1] "Object" "area" "centroid-0"
## [4] "centroid-1" "axis_major_length" "axis_minor_length"
## [7] "eccentricity"
These files contain single-cell features including the cell
identifier (Object), morphological features, the cells’
locations (centroid-0 and centroid-1) and the
mean pixel intensity per cell and per channel
(cur_intensities).
Now we will extract the relevant entries from the files.
counts <- cur_intensities[,-1]
meta <- cur_regionprobs[,c("Object", "area", "axis_major_length", "axis_minor_length")]
coords <- cur_regionprobs[,c("centroid-1", "centroid-0")]
From these features we can now construct the
SpatialExperiment object.
library(SpatialExperiment)
spe2 <- SpatialExperiment(assays = list(counts = t(counts)),
colData = meta,
sample_id = "Patient1_001",
spatialCoords = as.matrix(coords))
Next, we can store the spatial cell graph generated by
steinbock in the colPairs slot of the object.
Spatial cell graphs are usually stored as edge list in form of a CSV
file. The colPairs slot requires a SelfHits
entry storing an edge list where numeric entries represent the index of
the from and to cell in the
SpatialExperiment object. To generate such an edge list, we
need to match the cell IDs contained in the CSV against the cell IDs in
the SpatialExperiment object.
cur_pairs <- read_csv("data/steinbock/neighbors/Patient1_001.csv")
edgelist <- SelfHits(from = match(cur_pairs$Object, spe2$Object),
to = match(cur_pairs$Neighbor, spe2$Object),
nnode = ncol(spe2))
colPair(spe2, "neighborhood") <- edgelist
For further downstream analysis, we will use the
steinbock results as read in above.
After reading in the single-cell data, few further processing steps need to be taken.
Add additional metadata
We can set the colnames of the object to generate unique
identifiers per cell:
colnames(spe) <- paste0(spe$sample_id, "_", spe$ObjectNumber)
It is also often the case that sample-specific metadata are available externally. For the current data, we need to link the cancer type (also referred to as “Indication”) to each sample. This metadata is available as external excel file:
library(openxlsx)
library(stringr)
meta <- read.xlsx("data/steinbock/sample_metadata.xlsx")
spe$patient_id <- as.vector(str_extract_all(spe$sample_id, "Patient[1-4]", simplify = TRUE))
spe$ROI <- as.vector(str_extract_all(spe$sample_id, "00[1-8]", simplify = TRUE))
spe$indication <- meta$Indication[match(spe$patient_id, meta$Sample.ID)]
unique(spe$indication)
## [1] "SCCHN" "BCC" "NSCLC" "CRC"
The selected patients were diagnosed with different cancer types:
Transform counts
The distribution of expression counts across cells is often observed to be skewed towards the right side meaning lots of cells display low counts and few cells have high counts. To avoid analysis biases from these high-expressing cells, the expression counts are commonly transformed or clipped.
Here, we perform counts transformation using an inverse hyperbolic
sine function. This transformation is commonly applied to flow
cytometry data. The cofactor here defines the
expression range on which no scaling is performed. While the
cofactor for CyTOF data is often set to 5, IMC
data usually display much lower counts. We therefore apply a
cofactor of 1.
However, other transformations such as
log(counts(spe) + 0.01) should be tested when analysing IMC
data.
library(dittoSeq)
dittoRidgePlot(spe, var = "CD3", group.by = "patient_id", assay = "counts") +
ggtitle("CD3 - before transformation")
assay(spe, "exprs") <- asinh(counts(spe)/1)
dittoRidgePlot(spe, var = "CD3", group.by = "patient_id", assay = "exprs") +
ggtitle("CD3 - after transformation")
Define interesting channels
For downstream analysis such as visualization, dimensionality
reduction and clustering, only a subset of markers should be used. As
convenience, we can store an additional entry in the
rowData slot that specifies the markers of interest. Here,
we deselect the nuclear markers, which were primarily used for cell
segmentation, and keep all other biological targets.
rowData(spe)$use_channel <- !grepl("DNA|Histone", rownames(spe))
Define color schemes
We will define color schemes for different metadata entries of the
data and conveniently store them in the metadata slot of
the SpatialExperiment which will be helpful for downstream
data visualizations. We will use colors from the
RColorBrewer and dittoSeq package but any
other coloring package will suffice.
library(RColorBrewer)
color_vectors <- list()
ROI <- setNames(brewer.pal(length(unique(spe$ROI)), name = "BrBG"),
unique(spe$ROI))
patient_id <- setNames(brewer.pal(length(unique(spe$patient_id)), name = "Set1"),
unique(spe$patient_id))
sample_id <- setNames(c(brewer.pal(6, "YlOrRd")[3:5],
brewer.pal(6, "PuBu")[3:6],
brewer.pal(6, "YlGn")[3:5],
brewer.pal(6, "BuPu")[3:6]),
unique(spe$sample_id))
indication <- setNames(brewer.pal(length(unique(spe$indication)), name = "Set2"),
unique(spe$indication))
color_vectors$ROI <- ROI
color_vectors$patient_id <- patient_id
color_vectors$sample_id <- sample_id
color_vectors$indication <- indication
metadata(spe)$color_vectors <- color_vectors
The cytomapper package allows multi-channel image
handling and visualization within the Bioconductor framework. The most
common data format for multi-channel images or segmentation masks is the
TIFF file format, which is used by steinbock.
Here, we will read in multi-channel images and segmentation masks into a CytoImageList data container. It allows storing multiple multi-channel images and requires matched channels across all images within the object.
The loadImages function is used to read in processed
multi-channel images and their corresponding segmentation masks. Of
note, the multi-channel images generated by steinbock are
saved as 32-bit images while the segmentation masks are saved as 16-bit
images. To correctly scale pixel values of the segmentation masks when
reading them in set as.is = TRUE.
images <- loadImages("data/steinbock/img/")
## All files in the provided location will be read in.
masks <- loadImages("data/steinbock/masks_deepcell/", as.is = TRUE)
## All files in the provided location will be read in.
In the case of multi-channel images, it is beneficial to set the
channelNames for easy visualization. Using the
steinbock framework, the channel order of the single-cell
data matches the channel order of the multi-channel images. However, it
is recommended to make sure that the channel order is identical between
the single-cell data and the images.
channelNames(images) <- rownames(spe)
images
## CytoImageList containing 14 image(s)
## names(14): Patient1_001 Patient1_002 Patient1_003 Patient2_001 Patient2_002 Patient2_003 Patient2_004 Patient3_001 Patient3_002 Patient3_003 Patient4_005 Patient4_006 Patient4_007 Patient4_008
## Each image contains 40 channel(s)
## channelNames(40): MPO HistoneH3 SMA CD16 CD38 HLADR CD27 CD15 CD45RA CD163 B2M CD20 CD68 Ido1 CD3 LAG3 / LAG33 CD11c PD1 PDGFRb CD7 GrzB PDL1 TCF7 CD45RO FOXP3 ICOS CD8a CarbonicAnhydrase CD33 Ki67 VISTA CD40 CD4 CD14 Ecad CD303 CD206 cleavedPARP DNA1 DNA2
For image and mask visualization we will need to add additional
metadata to the elementMetadata slot of the
CytoImageList objects. This slot is easily accessible using
the mcols function.
Here, we will save the matched sample_id,
patient_id and indication information within
the elementMetadata slot of the multi-channel images and
segmentation masks objects. It is crucial that the order of the images
in both CytoImageList objects is the same.
all.equal(names(images), names(masks))
## [1] TRUE
patient_id <- str_extract_all(names(images), "Patient[1-4]", simplify = TRUE)
indication <- meta$Indication[match(patient_id, meta$Sample.ID)]
mcols(images) <- mcols(masks) <- DataFrame(sample_id = names(images),
patient_id = patient_id,
indication = indication)
An alternative way of generating a SingleCellExperiment
object directly from the multi-channel images and segmentation masks is
supported by the measureObjects
function of the cytomapper package. For each cell present
in the masks object, the function computes the mean pixel
intensity per channel as well as morphological features (area, radius,
major axis length, eccentricity) and the location of cells:
cytomapper_sce <- measureObjects(masks, image = images, img_id = "sample_id")
cytomapper_sce
## class: SingleCellExperiment
## dim: 40 47859
## metadata(0):
## assays(1): counts
## rownames(40): MPO HistoneH3 ... DNA1 DNA2
## rowData names(0):
## colnames: NULL
## colData names(10): sample_id object_id ... patient_id.V1 indication
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
The following section focuses on visualizing the single-cell data
contained in the SpatialExperiment object. The main
R/Bioconductor packages to support visualization are dittoSeq
and imcRtools
First, we will use non-linear dimensionality reduction methods to
project cells from a high-dimensional (40) down to a low-dimensional (2)
space. For this the scater
package provides the runUMAP and runTSNE
function. To ensure reproducibility, we will need to set a seed; however
different seeds and different parameter settings (e.g. the
perplexity) parameter in the runTSNE function
need to be tested to avoid interpreting visualization artefacts. For
dimensionality reduction, we will use all channels that show biological
variation across the dataset. However, marker selection can be performed
with different biological questions in mind.
library(scater)
set.seed(220225)
spe <- runUMAP(spe, subset_row = rowData(spe)$use_channel, exprs_values = "exprs")
spe <- runTSNE(spe, subset_row = rowData(spe)$use_channel, exprs_values = "exprs")
After dimensionality reduction, the low-dimensional embeddings are
stored in the reducedDim slot.
reducedDims(spe)
## List of length 2
## names(2): UMAP TSNE
head(reducedDim(spe, "UMAP"))
## [,1] [,2]
## Patient1_001_1 4.736075 -3.418426
## Patient1_001_2 4.481811 -3.180779
## Patient1_001_3 4.482297 -3.166900
## Patient1_001_4 4.227500 -2.845877
## Patient1_001_5 6.190731 -2.254569
## Patient1_001_6 5.531091 -3.464475
Visualization of the low-dimensional embedding facilitates assessment
of potential “batch effects”. The dittoDimPlot function
allows flexible visualization. It returns ggplot objects
which can be further modified.
First, we will visualize single-cell metadata such as the patient ID and the cancer type.
library(patchwork)
library(dittoSeq)
library(viridis)
# visualize patient id
p1 <- dittoDimPlot(spe, var = "patient_id", reduction.use = "UMAP", size = 0.2) +
scale_color_manual(values = metadata(spe)$color_vectors$patient_id) +
ggtitle("Patient ID on UMAP")
p2 <- dittoDimPlot(spe, var = "patient_id", reduction.use = "TSNE", size = 0.2) +
scale_color_manual(values = metadata(spe)$color_vectors$patient_id) +
ggtitle("Patient ID on TSNE")
# visualize indication
p3 <- dittoDimPlot(spe, var = "indication", reduction.use = "UMAP", size = 0.2) +
scale_color_manual(values = metadata(spe)$color_vectors$indication) +
ggtitle("Indication on UMAP")
p4 <- dittoDimPlot(spe, var = "indication", reduction.use = "TSNE", size = 0.2) +
scale_color_manual(values = metadata(spe)$color_vectors$indication) +
ggtitle("Indication on TSNE")
(p1 + p2) / (p3 + p4)
Next, we can visualize marker expression on the UMAP embedding.
# visualize marker expression
p1 <- dittoDimPlot(spe, var = "Ecad", reduction.use = "UMAP",
assay = "exprs", size = 0.2) +
scale_color_viridis(name = "Ecad") +
ggtitle("E-Cadherin expression on UMAP")
p2 <- dittoDimPlot(spe, var = "CD45RO", reduction.use = "UMAP",
assay = "exprs", size = 0.2) +
scale_color_viridis(name = "CD45RO") +
ggtitle("CD45RO expression on UMAP")
p3 <- dittoDimPlot(spe, var = "CD20", reduction.use = "UMAP",
assay = "exprs", size = 0.2) +
scale_color_viridis(name = "CD20") +
ggtitle("CD20 expression on UMAP")
p4 <- dittoDimPlot(spe, var = "CD3", reduction.use = "UMAP",
assay = "exprs", size = 0.2) +
scale_color_viridis(name = "CD3") +
ggtitle("CD3 expression on UMAP")
(p1 + p2) / (p3 + p4)
We observe a strong separation of tumor cells (Ecad+ cells) between the patients. Here, each patient was diagnosed with a different tumor type. The separation of tumor cells could be of biological origin since tumor cells tend to display differences in expression between patients and cancer types and/or of technical origin: the panel only contains a single tumor marker (E-Cadherin) and therefore slight technical differences in staining causes visible separation between cells of different patients. Nevertheless, the immune compartment (CD45RO+ cells) mix between patients and we can rule out systematic staining differences between patients.
This section focuses on visualizing the expression of all markers and highlighting variation between cells, images and patients.
First, we will visualize single-cell marker expression in form of a heatmap. Here, we sub-sample the dataset to 2000 cells for visualization purposes and overlay the cancer type from which the cells were extracted.
cur_cells <- sample(seq_len(ncol(spe)), 2000)
dittoHeatmap(spe[,cur_cells], genes = rownames(spe)[rowData(spe)$use_channel],
assay = "exprs", cluster_cols = TRUE, scale = "none",
heatmap.colors = viridis(100), annot.by = "indication",
annotation_colors = list(indication = metadata(spe)$color_vectors$indication))
We can differentiate between epithelial cells (Ecad+) and immune cells (CD45RO). Some of the markers are specifically detected (e.g., Ki67, CD20, Ecad) and others are more broadly detected (e.g. HLADR, B2M, CD4).
It can be beneficial to visualize the mean marker expression per
image to identify images with outlying marker expression. This check
does not indicate image quality per se but can highlight
biological differences. Here, we will use the
aggregateAcrossCells function of the scuttle
package to compute the mean expression per image. For visualization
purposes, we again asinh transform the mean expression
values.
library(scuttle)
image_mean <- aggregateAcrossCells(spe,
ids = spe$sample_id,
statistics="mean",
use.assay.type = "counts")
assay(image_mean, "exprs") <- asinh(counts(image_mean))
dittoHeatmap(image_mean, genes = rownames(spe)[rowData(spe)$use_channel],
assay = "exprs", cluster_cols = TRUE, scale = "none",
heatmap.colors = viridis(100),
annot.by = c("indication", "patient_id", "ROI"),
annotation_colors = list(indication = metadata(spe)$color_vectors$indication,
patient_id = metadata(spe)$color_vectors$patient_id,
ROI = metadata(spe)$color_vectors$ROI),
show_colnames = TRUE)
The data presented here originate from samples from different hospitals with potential differences in pre-processing and each sample stained individually. These (and other) technical aspects can induce staining differences between samples or batches of samples. In addition, patients were diagnosed with different cancer types. We will use ridgeline visualizations to check differences in staining patterns and biological differences in expression:
multi_dittoPlot(spe, vars = rownames(spe)[rowData(spe)$use_channel],
group.by = "patient_id", plots = "ridgeplot",
assay = "exprs",
color.panel = metadata(spe)$color_vectors$patient_id)
We observe variations in the distributions of marker expression across patients. These variations may arise partly from different abundances of cells in different images (e.g. Patient3 may have higher numbers of CD11c+ and PD1+ cells) as well as staining differences between samples. While most of the selected markers are specifically expressed in immune cell subtypes, we can see that E-Cadherin (a marker for epithelial (tumor) cells) shows similar expression across all patients.
A quality indicator for region selection is the image area covered by
cells (or biological tissue). This metric identifies regions of interest
(ROIs) where little cells are present, possibly hinting at incorrect
selection of the ROI. We can compute the percentage of covered image
area using the metadata contained in the SpatialExperiment
object:
library(dplyr)
colData(spe) %>%
as.data.frame() %>%
group_by(sample_id) %>%
summarize(cell_area = sum(area),
no_pixels = mean(width_px) * mean(height_px)) %>%
mutate(covered_area = cell_area / no_pixels) %>%
ggplot() +
geom_point(aes(reorder(sample_id,covered_area), covered_area)) +
theme_minimal(base_size = 15) +
ylim(c(0, 1)) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 8)) +
ylab("% covered area") + xlab("")
Next, we observe the distributions of cell size across the individual images. Differences in cell size distributions can indicate segmentation biases due to differences in cell density or can indicate biological differences due to cell type compositions (tumor cells tend to be larger than immune cells).
colData(spe) %>%
as.data.frame() %>%
group_by(sample_id) %>%
ggplot() +
geom_boxplot(aes(sample_id, area)) +
theme_minimal(base_size = 15) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 8)) +
ylab("Cell area") + xlab("")
summary(spe$area)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.00 47.00 70.00 76.38 98.00 466.00
The median cell size is 70 pixels with a median major axis length of
11.3. The largest cell has an area of 466 pixels which relates to a
diameter of 21.6 pixels assuming a circular shape. Overall, the
distribution of cell sizes is similar across images with images from
Patient4_005 and Patient4_007 showing a
reduced average cell size. These images contain fewer tumor cells which
can explain the smaller average cell size.
We detect very small cells in the dataset and will remove them. The chosen threshold is arbitrary and needs to be adjusted per dataset.
sum(spe$area < 5)
## [1] 65
spe <- spe[,spe$area >= 5]
Here, we introduce the plotSpatial function of the imcRtools package to visualize the cells’ centroids and cell-cell interactions as spatial graphs.
In the following example, we select images of one patient for
visualization purposes. Here, each dot (node) represents a cell and
edges are drawn between cells in close physical proximity as detected by
steinbock. Nodes are variably colored based on the cells’
expression level of E-cadherin and their size can be adjusted (e.g.,
based on measured cell area).
library(ggplot2)
library(viridis)
# steinbock interaction graph
plotSpatial(spe[,grepl("Patient3", spe$sample_id)],
node_color_by = "Ecad",
assay_type = "exprs",
img_id = "sample_id",
draw_edges = TRUE,
colPairName = "neighborhood",
nodes_first = FALSE,
node_size_by = "area",
directed = FALSE,
edge_color_fix = "grey") +
scale_size_continuous(range = c(0.1, 2)) +
ggtitle("E-cadherin expression")
The following section highlights the use of the cytomapper package to visualize multichannel images and segmentation masks. For visualization purposes we select 3 images from the dataset:
# Sample images
set.seed(220517)
cur_id <- sample(unique(spe$sample_id), 3)
cur_images <- images[names(images) %in% cur_id]
cur_masks <- masks[names(masks) %in% cur_id]
The following section gives examples for visualizing individual
channels or multiple channels as pseudo-color composite images. For this
the cytomapper package exports the plotPixels
function which expects a CytoImageList object storing one
or multiple multi-channel images. In the simplest use case, a single
channel can be visualized as follows:
plotPixels(cur_images,
colour_by = "Ecad",
bcg = list(Ecad = c(0, 5, 1)))
The plot above shows the tissue expression of the epithelial tumor
marker E-cadherin on the 3 selected images. The bcg
parameter (default c(0, 1, 1)) stands for “background”,
“contrast”, “gamma” and controls these attributes of the image. This
parameter takes a named list where each entry specifies these attributes
per channel. The first value of the numeric vector will be added to the
pixel intensities (background); pixel intensities will be multiplied by
the second entry of the vector (contrast); pixel intensities will be
exponentiated by the third entry of the vector (gamma). In most cases,
it is sufficient to adjust the second (contrast) entry of the
vector.
The following example highlights the visualization of 6 markers (maximum allowed number of markers) at once per image. The markers indicate the spatial distribution of tumor cells (E-cadherin), T cells (CD3), B cells (CD20), CD8+ T cells (CD8a), plasma cells (CD38) and proliferating cells (Ki67).
plotPixels(cur_images,
colour_by = c("Ecad", "CD3", "CD20", "CD8a", "CD38", "Ki67"),
bcg = list(Ecad = c(0, 5, 1),
CD3 = c(0, 5, 1),
CD20 = c(0, 5, 1),
CD8a = c(0, 5, 1),
CD38 = c(0, 8, 1),
Ki67 = c(0, 5, 1)))
The default colors for visualization are chosen by the additive RGB
(red, green, blue) color model. For six markers the default colors are:
red, green, blue, cyan (green + blue), magenta (red + blue), yellow
(green + red). These colors are the easiest to distinguish by eye.
However, you can select other colors for each channel by setting the
colour parameter:
plotPixels(cur_images,
colour_by = c("Ecad", "CD3", "CD20"),
bcg = list(Ecad = c(0, 5, 1),
CD3 = c(0, 5, 1),
CD20 = c(0, 5, 1)),
colour = list(Ecad = c("black", "burlywood1"),
CD3 = c("black", "cyan2"),
CD20 = c("black", "firebrick1")))
The colour parameter takes a named list in which each
entry specifies the colors from which a color gradient is constructed
via colorRampPalette. These are usually vectors of length 2
in which the first entry is "black" and the second entry
specifies the color of choice. Although not recommended, you can also
specify more than two colors to generate a more complex color
gradient.
As an alternative to setting the bcg parameter, images
can first be normalized. Normalization here means to scale the pixel
intensities per channel between 0 and 1 (or a range specified by the
ft parameter in the normalize function). By
default, the normalize function scales pixel intensities
across all images contained in the
CytoImageList object (separateImages = FALSE).
Each individual channel is scaled independently
(separateChannels = TRUE).
After 0-1 normalization, maximum pixel intensities can be clipped to
enhance the contrast of the image (setting the inputRange
parameter). In the following example, the clipping to 0 and 0.2 is the
same as multiplying the pixel intensities by a factor of 5.
# 0 - 1 channel scaling across all images
norm_images <- normalize(cur_images)
# Clip channel at 0.2
norm_images <- normalize(norm_images, inputRange = c(0, 0.2))
plotPixels(norm_images,
colour_by = c("Ecad", "CD3", "CD20", "CD8a", "CD38", "Ki67"))
The default setting of scaling pixel intensities across all images ensures comparable intensity levels across images. Pixel intensities can also be scaled per image therefore correcting for staining/expression differences between images:
# 0 - 1 channel scaling per image
norm_images <- normalize(cur_images, separateImages = TRUE)
# Clip channel at 0.2
norm_images <- normalize(norm_images, inputRange = c(0, 0.2))
plotPixels(norm_images,
colour_by = c("Ecad", "CD3", "CD20", "CD8a", "CD38", "Ki67"))
As we can see, the marker Ki67 appears brighter on image 2 and 3 in comparison to scaling the channel across all images.
Finally, the normalize function also accepts a named
list input for the inputRange argument. In this list, the
clipping range per channel can be set individually:
# 0 - 1 channel scaling per image
norm_images <- normalize(cur_images,
separateImages = TRUE,
inputRange = list(Ecad = c(0, 50),
CD3 = c(0, 30),
CD20 = c(0, 40),
CD8a = c(0, 50),
CD38 = c(0, 10),
Ki67 = c(0, 70)))
plotPixels(norm_images,
colour_by = c("Ecad", "CD3", "CD20", "CD8a", "CD38", "Ki67"))
The cytomapper package provides the
plotCells function to visualize the aggregated pixel
intensities per cell on segmnetation masks. In the current dataset pixel
intensities were aggregated by computing the mean pixel intensity per
cell and per channel. The plotCells function accepts the
exprs_values argument (default counts) that
allows selecting the assay which stores the expression values that
should be visualized.
In the following example, we visualize the asinh-transformed mean pixel intensities of the epithelial marker E-cadherin on segmentation masks.
plotCells(cur_masks,
object = spe,
cell_id = "ObjectNumber", img_id = "sample_id",
colour_by = "Ecad",
exprs_values = "exprs")
We can now use the plotPixels function to outline
segmented cells on image composites to observe segmentation accuracy.
Without having ground-truth data readily available, a common approach to
segmentation quality control is to overlay segmentation masks on
composite images displaying channels that were used for
segmentation.
Here, we select 3 random images and perform image- and channel-wise normalization (channels are first min-max normalized and scaled to a range of 0-1 before clipping the maximum intensity to 0.2).
library(cytomapper)
set.seed(20220118)
img_ids <- sample(seq_len(length(images)), 3)
# Normalize and clip images
cur_images <- images[img_ids]
cur_images <- normalize(cur_images, separateImages = TRUE)
cur_images <- normalize(cur_images, inputRange = c(0, 0.2))
plotPixels(cur_images,
mask = masks[img_ids],
img_id = "sample_id",
missing_colour = "white",
colour_by = c("CD163", "CD20", "CD3", "Ecad", "DNA1"),
colour = list(CD163 = c("black", "yellow"),
CD20 = c("black", "red"),
CD3 = c("black", "green"),
Ecad = c("black", "cyan"),
DNA1 = c("black", "blue")),
image_title = NULL,
legend = list(colour_by.title.cex = 0.7,
colour_by.labels.cex = 0.7))
We can see that nuclei are centered within the segmentation masks and all cell types are correctly segmented . A common challenge here is to segment large (e.g. epithelial cells - in cyan) versus small (e.g. B cells - in red). However, the segmentation approach here appears to correctly segment cells across different sizes.
Finally, the generated data objects can be saved for further downstream processing and analysis.
saveRDS(spe, "data/spe.rds")
saveRDS(images, "data/images.rds")
saveRDS(masks, "data/masks.rds")
sessionInfo()
## R version 4.2.2 (2022-10-31)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] dplyr_1.0.10 viridis_0.6.2
## [3] viridisLite_0.4.1 patchwork_1.1.2
## [5] scater_1.26.1 scuttle_1.8.2
## [7] RColorBrewer_1.1-3 dittoSeq_1.10.0
## [9] ggplot2_3.4.0 stringr_1.5.0
## [11] openxlsx_4.2.5.1 readr_2.1.3
## [13] cytomapper_1.10.0 EBImage_4.40.0
## [15] imcRtools_1.4.2 SpatialExperiment_1.8.0
## [17] SingleCellExperiment_1.20.0 SummarizedExperiment_1.28.0
## [19] Biobase_2.58.0 GenomicRanges_1.50.1
## [21] GenomeInfoDb_1.34.4 IRanges_2.32.0
## [23] S4Vectors_0.36.1 BiocGenerics_0.44.0
## [25] MatrixGenerics_1.10.0 matrixStats_0.63.0
##
## loaded via a namespace (and not attached):
## [1] utf8_1.2.2 shinydashboard_0.7.2
## [3] R.utils_2.12.2 tidyselect_1.2.0
## [5] htmlwidgets_1.5.4 grid_4.2.2
## [7] BiocParallel_1.32.4 Rtsne_0.16
## [9] DropletUtils_1.18.1 munsell_0.5.0
## [11] ScaledMatrix_1.6.0 codetools_0.2-18
## [13] units_0.8-1 DT_0.26
## [15] withr_2.5.0 colorspace_2.0-3
## [17] highr_0.9 knitr_1.41
## [19] rstudioapi_0.14 labeling_0.4.2
## [21] GenomeInfoDbData_1.2.9 polyclip_1.10-4
## [23] bit64_4.0.5 farver_2.1.1
## [25] pheatmap_1.0.12 rhdf5_2.42.0
## [27] vctrs_0.5.1 generics_0.1.3
## [29] xfun_0.35 R6_2.5.1
## [31] ggbeeswarm_0.6.0 graphlayouts_0.8.4
## [33] rsvd_1.0.5 locfit_1.5-9.6
## [35] concaveman_1.1.0 bitops_1.0-7
## [37] rhdf5filters_1.10.0 cachem_1.0.6
## [39] RTriangle_1.6-0.11 DelayedArray_0.24.0
## [41] assertthat_0.2.1 promises_1.2.0.1
## [43] scales_1.2.1 vroom_1.6.0
## [45] ggraph_2.1.0 beeswarm_0.4.0
## [47] gtable_0.3.1 beachmat_2.14.0
## [49] tidygraph_1.2.2 rlang_1.0.6
## [51] systemfonts_1.0.4 BiocManager_1.30.19
## [53] yaml_2.3.6 abind_1.4-5
## [55] httpuv_1.6.7 tools_4.2.2
## [57] ellipsis_0.3.2 raster_3.6-11
## [59] jquerylib_0.1.4 proxy_0.4-27
## [61] ggridges_0.5.4 Rcpp_1.0.9
## [63] sparseMatrixStats_1.10.0 zlibbioc_1.44.0
## [65] classInt_0.4-8 purrr_0.3.5
## [67] RCurl_1.98-1.9 cowplot_1.1.1
## [69] ggrepel_0.9.2 magrittr_2.0.3
## [71] data.table_1.14.6 magick_2.7.3
## [73] hms_1.1.2 mime_0.12
## [75] fftwtools_0.9-11 evaluate_0.19
## [77] xtable_1.8-4 jpeg_0.1-10
## [79] gridExtra_2.3 compiler_4.2.2
## [81] tibble_3.1.8 KernSmooth_2.23-20
## [83] crayon_1.5.2 R.oo_1.25.0
## [85] htmltools_0.5.4 later_1.3.0
## [87] tzdb_0.3.0 tiff_0.1-11
## [89] tidyr_1.2.1 DBI_1.1.3
## [91] tweenr_2.0.2 MASS_7.3-58.1
## [93] sf_1.0-9 BiocStyle_2.26.0
## [95] Matrix_1.5-3 cli_3.4.1
## [97] R.methodsS3_1.8.2 parallel_4.2.2
## [99] igraph_1.3.5 pkgconfig_2.0.3
## [101] sp_1.5-1 terra_1.6-47
## [103] svglite_2.1.0 vipor_0.4.5
## [105] bslib_0.4.1 dqrng_0.3.0
## [107] XVector_0.38.0 digest_0.6.31
## [109] RcppAnnoy_0.0.20 rmarkdown_2.18
## [111] uwot_0.1.14 edgeR_3.40.0
## [113] distances_0.1.8 DelayedMatrixStats_1.20.0
## [115] shiny_1.7.3 rjson_0.2.21
## [117] lifecycle_1.0.3 jsonlite_1.8.4
## [119] Rhdf5lib_1.20.0 BiocNeighbors_1.16.0
## [121] limma_3.54.0 fansi_1.0.3
## [123] pillar_1.8.1 lattice_0.20-45
## [125] fastmap_1.1.0 glue_1.6.2
## [127] zip_2.2.2 png_0.1-8
## [129] svgPanZoom_0.3.4 bit_4.0.5
## [131] ggforce_0.4.1 class_7.3-20
## [133] stringi_1.7.8 sass_0.4.4
## [135] HDF5Array_1.26.0 nnls_1.4
## [137] BiocSingular_1.14.0 irlba_2.3.5.1
## [139] e1071_1.7-12